Low-Overhead Barrier Synchronization for OpenMP-like Parallelism on the Single-Chip Cloud Computer

نویسندگان

  • Hayder Al-Khalissi
  • Andrea Marongiu
  • Mladen Berekovic
چکیده

To simplify program development for the Singlechip Cloud Computer (SCC) it is desirable to have highlevel, shared memory-based parallel programming abstractions (e.g., OpenMP-like programming model). Central to any similar programming model are barrier synchronization primitives, to coordinate the work of parallel threads. To allow high-level barrier constructs to deliver good performance, we need an efficient implementation of the underlying synchronization algorithm. In this work, we consider some of the most widely used approaches for barrier synchronization on the SCC, which constitutes the basis for implementing OpenMP-like parallelism. In particular, we consider optimizations that leverage SCC-specific hardware support for synchronization, or its explicitly-managed memory buffers. We provide a detailed evaluation of the performance achieved by different approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An approach for Supporting OpenMP on the Intel SCC

The advent of the Single-chip Cloud Computer (SCC) chip in the many-core realm imposes challenges to programmers. From a programmer’s perspective is desirable to use the shared memory paradigm, employing high-level parallel programming abstractions such as OpenMP. In this paper we discuss our ongoing efforts to support OpenMP on SCC. Specifically, we focus on the following three key aspects in ...

متن کامل

Expressing DOACROSS Loop Dependences in OpenMP

OpenMP is a widely used programming standard for a broad range of parallel systems. In the OpenMP programming model, synchronization points are specified by implicit or explicit barrier operations within a parallel region. However, certain classes of computations, such as stencil algorithms, can be supported with better synchronization efficiency and data locality when using doacross parallelis...

متن کامل

Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture

Recent emerging many-core-on-a-chip architectures present massive on-chip parallelism through hardware support for multithreading. In order to achieve fast development of parallel applications that exploit this massive intrachip parallelism to achieve highly sustainable performance, suitable programming models are needed. OpenMP, the industry de facto standard for writing parallel programs on s...

متن کامل

Dependence-Based Code Generation for a CELL Processor

Obtaining high performance on the STI CELL processor requires substantial programming effort because its architectural features must be explicitly managed, with separate codes required for two different types of cores (PPE and SPE). Research at IBM has developed a single source-image compiler for CELL that performs vectorization but uses OpenMP to specify cross-core parallelism. In this paper, ...

متن کامل

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-core processors based commodity servers recently become building blocks for high performance computing Linux clusters. The multi-core processors deliver better performance-to-cost ratios relative to their single-core predecessors through on-chip multi-threading. However, they present challenges in developing high performance multi-threaded code. In this paper we study the performance of d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012